Data used in biostatistics are often collected in online databases, but some data are still collected on paper. Regardless of their source, the data must be put into electronic format and arranged in a certain way before they can be analyzed using statistical software. Chapter 8 describes how to get your data into the computer and arrange it properly so it can be analyzed correctly. It also describes how to collect and validate your data. Then in Chapter 9, we show you how to summarize each type of data and display it graphically. We explain how to make bar charts, box-and-whiskers charts, and more.

Drawing Conclusions from Your Data

Most statistical analysis involves inferring, or drawing conclusions about the population at large based on your observations of a sample drawn from that population. The theory of statistical inference is often divided into two broad sub-theories: estimation theory and decision theory.

Statistical estimation theory

Chapter 10 deals with statistical estimation theory, which addresses the question of how accurately and precisely you can estimate a population parameter from the values you observe in your sample. For example, you may want to estimate the mean blood hemoglobin concentration in adults with Type II diabetes, or the true correlation coefficient between body weight and height in certain pediatric populations. Chapter 10 describes how to estimate these parameters by constructing a confidence interval around your estimate. A confidence interval is a range that is likely to include the true population parameter, and its width gives you an idea of the precision of your estimate.
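
To make the idea of a confidence interval concrete, here is a minimal Python sketch that computes an approximate 95% confidence interval around a sample mean. It assumes a normal approximation (for small samples, a t-based interval is more appropriate and comes out somewhat wider). The function name `mean_confidence_interval` and the hemoglobin values are hypothetical and invented purely for illustration.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) for the sample mean.

    Uses a normal approximation: mean +/- z * standard error.
    This is an illustrative assumption, not the only valid method.
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean: sample SD divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem


# Hypothetical hemoglobin values in g/dL (made-up data).
hemoglobin = [13.1, 14.2, 12.8, 13.9, 14.5, 13.3, 12.9, 14.0, 13.6, 13.7]

mean, lower, upper = mean_confidence_interval(hemoglobin)
print(f"mean = {mean:.2f} g/dL, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A narrower interval indicates a more precise estimate. Asking for a higher confidence level (for example, `confidence=0.99`) widens the interval for the same data.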

Statistical decision theory

Much of the rest of this book deals with statistical decision theory, which is how to decide whether some effect you’ve observed in your data reflects a real difference or association in the background population or is merely the result of random fluctuations in your data or sampling. If you measure the mean blood hemoglobin concentration in two different samples of adults with Type II diabetes, you will likely get two different numbers. But does this difference reflect a real difference between the groups in terms of blood hemoglobin concentration? Or is it merely a result of random fluctuations? Statistical decision theory helps you decide.
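
One way to see this question in action is a permutation test, sketched below in Python. It is not the method this book's later chapters focus on (those use t tests and related procedures). It is a simple simulation-based stand-in that asks: if group membership were purely random, how often would a difference in means at least as large as the observed one appear? The function name `permutation_test` and both samples are hypothetical, invented for illustration.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Return (observed_difference, p_value) for a two-sided
    permutation test on the difference in group means."""
    rng = random.Random(seed)  # fixed seed so results are repeatable
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p value above zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical hemoglobin values in g/dL for two samples (made-up data).
sample_1 = [13.1, 14.2, 12.8, 13.9, 14.5, 13.3]
sample_2 = [13.4, 13.0, 14.1, 13.8, 12.9, 14.3]

difference, p_value = permutation_test(sample_1, sample_2)
print(f"observed difference = {difference:.2f} g/dL, p = {p_value:.3f}")
```

A large p value means the observed difference is easily produced by random shuffling alone, while a small p value suggests the difference is unlikely to be due to random fluctuations.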

In Part 4, we cover statistical decision theory in terms of comparing means and proportions between groups, as well as understanding the relationship between two or more variables.

Comparing groups

In Part 4, we show you different ways to compare groups statistically.

In Chapter 11, you see how to compare average values between two or more groups by using t tests and ANOVAs. We also describe their nonparametric counterparts that can be used with skewed or other non-normally distributed data.

Chapter 12 shows how to compare proportions between two or more groups, such as the proportions of patients responding to two different drugs, using the chi-square and Fisher Exact tests on cross-tabulated (cross-tab) data.

Chapter 13 focuses on one specific kind of cross-tab called the fourfold table, which has exactly